[ENH] Quantized Spann Segment Writer #6397
base: main
Conversation
This stack of pull requests is managed by Graphite. Learn more about stacking.

Reviewer Checklist
Please leverage this checklist to ensure your code review is thorough before approving:
- Testing, Bugs, Errors, Logs, Documentation
- System Compatibility
- Quality
Feature-Gating Quantized SPANN Segment Writer Pipeline

This PR introduces a complete quantized SPANN segment writer/flush pipeline, wiring it through the segment manager, schema helpers, and feature gating so quantized segments can be produced and reopened safely. The effort separates

Key Changes
• Added

Possible Issues
• Quantized centroid/metadata files are not included in

This summary was automatically generated by @propel-code-bot
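The feature gating described above can be illustrated with a minimal, self-contained sketch. The names (`VectorWriter`, `select_writer`, the `quantized_spann` feature) are illustrative stand-ins, not the PR's actual identifiers:

```rust
// Hypothetical sketch of compile-time feature gating for a writer
// implementation, assuming a Cargo feature named "quantized_spann".
#[derive(Debug)]
enum VectorWriter {
    Spann,
    #[cfg(feature = "quantized_spann")]
    QuantizedSpann,
}

fn select_writer(quantized_requested: bool) -> VectorWriter {
    // When the feature is compiled in, honor a request for the quantized writer.
    #[cfg(feature = "quantized_spann")]
    if quantized_requested {
        return VectorWriter::QuantizedSpann;
    }
    // Otherwise (or when the feature is off) fall back to the plain SPANN writer.
    let _ = quantized_requested;
    VectorWriter::Spann
}
```

With the feature disabled, the quantized variant does not exist at all, so a stale or misconfigured segment cannot accidentally select it.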
```rust
            QuantizedSpannSegmentError::Config(format!(
                "failed to parse record segment file path: {e}"
            ))
        })?;
        let options = BlockfileReaderOptions::new(id, prefix.to_string());
        let reader = blockfile_provider.read(options).await.map_err(|e| {
            QuantizedSpannSegmentError::Config(format!(
                "failed to open record segment reader: {e}"
            ))
        })?;
        Some(reader)
    }
    None => None,
    },
    None => None,
};

// Order matches file_path_keys: cluster[0], embedding_metadata[1],
// quantized_centroid[2], raw_centroid[3], scalar_metadata[4].
let file_ids = QuantizedSpannIds {
    embedding_metadata_id: parsed[1].1,
    prefix_path: prefix_path.clone(),
    quantized_centroid_id: IndexUuid(parsed[2].1),
    quantized_cluster_id: parsed[0].1,
    raw_centroid_id: IndexUuid(parsed[3].1),
    scalar_metadata_id: parsed[4].1,
};
QuantizedSpannIndexWriter::open(
    cluster_block_size,
    vector_segment.collection,
    spann_config,
    dimensionality,
    distance_function,
    file_ids,
    cmek,
    prefix_path.clone(),
    raw_embedding_reader,
    blockfile_provider,
    usearch_provider,
)
.await?
```
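The "failed to parse record segment file path" error above guards a path-to-id parsing step. A hedged, stdlib-only sketch of that kind of parsing (the helper `parse_file_path` and the `"<prefix>/<id>"` layout are assumptions for illustration, not the PR's real code):

```rust
// Hypothetical helper: split a stored file path of the form "<prefix>/<id>"
// into its prefix and id components, erroring on malformed input the same
// way the snippet above maps parse failures into a Config error.
fn parse_file_path(path: &str) -> Result<(String, String), String> {
    // The id is everything after the last '/'; the prefix is the rest.
    let (prefix, id) = path
        .rsplit_once('/')
        .ok_or_else(|| format!("failed to parse record segment file path: {path}"))?;
    if id.is_empty() {
        return Err(format!("failed to parse record segment file path: {path}"));
    }
    Ok((prefix.to_string(), id.to_string()))
}
```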
[Logic] `apply_materialized_log_chunk()` now hard-fails whenever the materialized log record doesn’t contain an embedding inline. In production, `materialize_logs()` commonly hydrates embeddings from the record segment (the log itself often omits them after compaction), so legitimate `AddNew`/`OverwriteExisting` operations will now panic with `QuantizedSpannSegmentError::Data`. You should accept the `RecordSegmentReader` that’s already being passed to `VectorSegmentWriter::apply_materialized_log_chunk`, hydrate the record when `embeddings_ref_from_log()` returns `None`, and only error when both sources are missing. For example:

```rust
pub async fn apply_materialized_log_chunk(
    &self,
    record_segment_reader: &RecordSegmentReader<'_>,
    materialized_chunk: &MaterializeLogsResult,
) -> Result<(), ApplyMaterializedLogError> {
    for record in materialized_chunk {
        let embedding = match record.embeddings_ref_from_log() {
            Some(v) => Cow::Borrowed(v),
            None => Cow::Owned(
                record
                    .hydrate(record_segment_reader, 1)
                    .await?
                    .embedding
                    .to_vec(),
            ),
        };
        self.index.add(record.get_offset_id(), &embedding).await?;
    }
    Ok(())
}
```

Without this fallback, any replay that relies on the record segment (which is the default case) will immediately fail.
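The borrow-or-hydrate pattern in the suggestion can be isolated into a small, self-contained sketch. `LogRecord`, `RecordSegment`, and `resolve_embedding` are simplified stand-ins for Chroma's real types, assuming only that the log record may or may not carry the embedding inline:

```rust
use std::borrow::Cow;

// Stand-in for a materialized log record that may carry its embedding inline.
struct LogRecord {
    inline_embedding: Option<Vec<f32>>,
}

// Stand-in for the record segment that stores embeddings durably.
struct RecordSegment {
    stored_embedding: Vec<f32>,
}

// Prefer the inline embedding (zero-copy borrow); fall back to the record
// segment only when the log omits it, mirroring the reviewer's suggestion.
fn resolve_embedding<'a>(
    record: &'a LogRecord,
    segment: &RecordSegment,
) -> Cow<'a, [f32]> {
    match record.inline_embedding.as_deref() {
        Some(v) => Cow::Borrowed(v),
        None => Cow::Owned(segment.stored_embedding.clone()),
    }
}
```

`Cow` avoids copying in the common inline case while still owning the hydrated vector when the fallback path allocates one.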
File: rust/segment/src/quantized_spann.rs
Line: 187
`AddNew`/`OverwriteExisting` requires the embedding to be present as a system invariant
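The invariant the author describes can be sketched as an explicit check, where a missing embedding on an add/overwrite is surfaced as a data error rather than silently skipped. `Op` and `check_embedding_invariant` are hypothetical names, not the PR's actual types:

```rust
// Hypothetical materialized-log operation kinds.
#[derive(Debug)]
enum Op {
    AddNew,
    OverwriteExisting,
    Delete,
}

// Enforce the stated invariant: AddNew and OverwriteExisting must carry an
// embedding; other operations (e.g. Delete) legitimately have none.
fn check_embedding_invariant(op: &Op, embedding: Option<&[f32]>) -> Result<(), String> {
    match op {
        Op::AddNew | Op::OverwriteExisting if embedding.is_none() => {
            Err("materialized record missing embedding".to_string())
        }
        _ => Ok(()),
    }
}
```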
Description of changes
Summarize the changes made by this PR.
- `QuantizedSpannIndexWriter` to facilitate segment writer `commit` to a separate `finish`
- `QuantizedSpannSegmentWriter`, under the feature flag
- `VectorSegmentWriter` to use the new writer impl

Test plan
How are these changes tested?
- `pytest` for python, `yarn test` for js, `cargo test` for rust

Migration plan
Are there any migrations, or any forwards/backwards compatibility changes needed in order to make sure this change deploys reliably?
Observability plan
What is the plan to instrument and monitor this change?
Documentation Changes
Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the docs section?